SYNTHETIC INSECTICIDAL CRYSTAL PROTEIN GENE


FIELD OF THE INVENTION 

This invention relates to the field of bacterial molecular biology and, in particular, to genetic engineering
by recombinant technology for the purpose of protecting plants from insect pests. Disclosed herein are the
chemical synthesis of a modified crystal protein gene from Bacillus thuringiensis var. tenebrionis (Btt), and
the selective expression of this synthetic insecticidal gene. Also disclosed is the transfer of the cloned
synthetic gene into a host microorganism, rendering the organism capable of producing, at improved levels
of expression, a protein having toxicity to insects. This invention facilitates the genetic engineering of
bacteria and plants to attain desired expression levels of novel toxins having agronomic value.


BACKGROUND OF THE INVENTION 

B. thuringiensis (Bt) is unique in its ability to produce, during the process of sporulation, proteinaceous,
crystalline inclusions which are found to be highly toxic to several insect pests of agricultural importance.
The crystal proteins of different Bt strains have a rather narrow host range and hence are used
commercially as very selective biological insecticides. Numerous strains of Bt are toxic to lepidopteran and
dipteran insects. Recently two subspecies (or varieties) of Bt have been reported to be pathogenic to
coleopteran insects: var. tenebrionis (Krieg et al. (1983) Z. Angew. Entomol. 96:500-508) and var. san diego
(Herrnstadt et al. (1986) Biotechnology 4:305-308). Both strains produce flat, rectangular crystal inclusions and
have a major crystal component of 64-68 kDa (Herrnstadt et al. supra; Bernhard (1986) FEMS Microbiol.
Lett. 33:261-265).

Toxin genes from several subspecies of Bt have been cloned and the recombinant clones were found to
be toxic to lepidopteran and dipteran insect larvae. The two coleopteran-active toxin genes have also been
isolated and expressed. Herrnstadt et al. supra cloned a 5.8 kb Bam HI fragment of Bt var. san diego DNA.
The protein expressed in E. coli was toxic to P. luteola (elm leaf beetle) and had a molecular weight of
approximately 83 kDa. This 83 kDa toxin product from the var. san diego gene was larger than the 64 kDa
crystal protein isolated from Bt var. san diego cells, suggesting that the Bt var. san diego crystal protein
may be synthesized as a larger precursor molecule that is processed by Bt var. san diego but not by E. coli
prior to being formed into a crystal.

Sekar et al. ((1987) Proc. Nat. Acad. Sci. USA 84:7036-7040; U.S. Patent Application 108,285, filed
October 13, 1987) isolated the crystal protein gene from Btt and determined the nucleotide sequence. This
crystal protein gene was contained on a 5.9 kb Bam HI fragment (pNSBF544). A subclone containing the 3
kb Hind III fragment from pNSBF544 was constructed. This Hind III fragment contains an open reading frame
(ORF) that encodes a 644-amino acid polypeptide of approximately 73 kDa. Extracts of both subclones
exhibited toxicity to larvae of Colorado potato beetle (Leptinotarsa decemlineata, a coleopteran insect). 73-
and 65-kDa peptides that cross-reacted with an antiserum against the crystal protein of var. tenebrionis
were produced on expression in E. coli. Sporulating var. tenebrionis cells contain an immunoreactive 73-kDa
peptide that corresponds to the expected product from the ORF of pNSBP544. However, isolated crystals
primarily contain a 65-kDa component. When the crystal protein gene was shortened at the N-terminal
region, the dominant protein product obtained was the 65-kDa peptide. A deletion derivative, p544Pst-Met5,
was enzymatically derived from the 5.9 kb Bam HI fragment upon removal of forty-six amino acid residues
from the N-terminus. Expression of the N-terminal deletion derivative, p544Pst-Met5, resulted in the
production of, almost exclusively, the 65 kDa protein. Recently, McPherson et al. (1988) Biotechnology
6:61-66 demonstrated that the Btt gene contains two functional translational initiation codons in the same
reading frame leading to the production of both the full-length protein and an N-terminal truncated form.

Chimeric toxin genes from several strains of Bt have been expressed in plants. Four modified Bt2
genes from var. berliner 1715, under the control of the 2' promoter of the Agrobacterium TR-DNA, were
transferred into tobacco plants (Vaeck et al. (1987) Nature 328:33-37). Insecticidal levels of toxin were
produced when truncated genes were expressed in transgenic plants. However, the steady state mRNA
levels in the transgenic plants were so low that they could not be reliably detected in Northern blot analysis
and hence were quantified using ribonuclease protection experiments. Bt mRNA levels in plants producing
the highest level of protein corresponded to approximately 0.0001% of the poly(A)+ mRNA. In the report by
Vaeck et al. (1987) supra, expression of chimeric genes containing the entire coding sequence of Bt2 was
compared to that of those containing truncated Bt2 genes. Additionally, some T-DNA constructs included a
chimeric NPTII gene as a marker selectable in plants, whereas other constructs carried translational fusions
between fragments of Bt2 and the NPTII gene. Insecticidal levels of toxin were produced when truncated Bt2
genes or fusion constructs were expressed in transgenic plants. Greenhouse-grown plants produced
approximately 0.02% of the total soluble protein as the toxin, or 3 µg of toxin per g fresh leaf tissue and,
even at five-fold lower levels, showed 100% mortality in six-day feeding assays. However, no significant
insecticidal activity could be obtained using the intact Bt2 coding sequence, despite the fact that the same
promoter was used to direct its expression. Intact Bt2 protein and RNA yields in the transgenic plant leaves
were 10-50 times lower than those for the truncated Bt2 polypeptides or fusion proteins.

Barton et al. (1987) Plant Physiol. 85:1103-1109 showed expression of a Bt protein in a system
containing a 35S promoter, a viral (TMV) leader sequence, the Bt HD-1 4.5 kb gene (encoding a 645 amino
acid protein followed by two proline residues) and a nopaline synthase (nos) poly(A)+ sequence. Under
these conditions expression was observed for Bt mRNA at levels up to 47 pg/20 µg RNA and 12 ng/mg
plant protein. This amount of Bt protein in plant tissue produced 100% mortality in two days. This level of
expression still represents a low level of mRNA (2.5 x 10^-4 %) and protein (1.2 x 10^-3 %).

Various hybrid proteins consisting of N-terminal fragments of increasing length of the Bt2 protein fused
to NPTII were produced in E. coli by Hofte et al. (1988) FEBS Lett. 226:364-370. Fusion proteins containing
the first 607 amino acids of Bt2 exhibited insect toxicity; fusion proteins not containing this minimum
N-terminal fragment were nontoxic. Appearance of NPTII activity was not dependent upon the presence of
insecticidal activity; however, the conformation of the Bt2 polypeptide appeared to exert an important
influence on the enzymatic activity of the fused NPTII protein. This study did suggest that the global 3-D
structure of the Bt2 polypeptide is disturbed in truncated polypeptides.

A number of researchers have attempted to express plant genes in yeast (Neill et al. (1987) Gene
55:303-317; Rothstein et al. (1987) Gene 55:353-356; Coraggio et al. (1986) EMBO J. 5:459-465) and E. coli
(Fuzakawa et al. (1987) FEBS Lett. 224:125-127; Vies et al. (1986) EMBO J. 5:2439-2444; Gatenby et al.
(1987) Eur. J. Biochem. 168:227-231). In the case of wheat α-gliadin (Neill et al. (1987) supra), α-amylase
(Rothstein et al. (1987) supra) genes, and maize zein genes (Coraggio et al. (1986) supra) in yeast, low
levels of expression have been reported. Neill et al. have suggested that the low levels of expression of
α-gliadin in yeast may be due in part to codon usage bias, since α-gliadin codons for Phe, Leu, Ser, Gly, Tyr
and especially Glu do not correlate well with the abundant yeast isoacceptor tRNAs. In E. coli, however,
soybean glycinin A2 (Fuzakawa et al. (1987) supra) and wheat RuBPC SSU (Vies et al. (1986) supra;
Gatenby et al. (1987) supra) are expressed adequately.

Not much is known about the makeup of tRNA populations in plants. Viotti et al. (1978) Biochim.
Biophys. Acta 517:125-132 report that maize endosperm actively synthesizing zein, a storage protein rich in
glutamine, leucine, and alanine, is characterized by higher levels of accepting activity for these three amino
acids than are maize embryo tRNAs. This may indicate that the tRNA population of specific plant tissues
may be adapted for optimum translation of highly expressed proteins such as zein. To our knowledge, no
one has experimentally altered codon bias in highly expressed plant genes to determine the possible effects
on protein translation in plants or on the level of expression.


SUMMARY OF THE INVENTION

It is the overall object of the present invention to provide a means for plant protection against insect
damage. The invention disclosed herein comprises a chemically synthesized gene encoding an insecticidal
protein which is functionally equivalent to a native insecticidal protein of Bt. This synthetic gene is designed
to be expressed in plants at a level higher than a native Bt gene. It is preferred that the synthetic gene be
designed to be highly expressed in plants as defined herein. Preferably, the synthetic gene is at least
approximately 85% homologous to an insecticidal protein gene of Bt.

It is a particular object of this invention to provide a synthetic structural gene coding for an insecticidal
protein from Btt having, for example, the nucleotide sequences presented in Figure 1 and spanning
nucleotides 1 through 1793 or spanning nucleotides 1 through 1833 with functional equivalence.

In designing synthetic Btt genes of this invention for enhanced expression in plants, the DNA sequence
of the native Btt structural gene is modified in order to contain codons preferred by highly expressed plant
genes, to attain an A + T content in nucleotide base composition substantially that found in plants, and also
preferably to form a plant initiation sequence, and to eliminate sequences that cause destabilization,
inappropriate polyadenylation, degradation and termination of RNA and to avoid sequences that constitute
secondary structure hairpins and RNA splice sites. In the synthetic genes, codons used to specify a given
amino acid are selected with regard to the distribution frequency of codon usage employed in highly
expressed plant genes to specify that amino acid. The distribution frequency of codon usage utilized in the
synthetic gene is a determinant of the level of expression. Hence, the synthetic gene is designed such that
its distribution frequency of codon usage deviates, preferably, no more than 25% from that of highly
expressed plant genes and, more preferably, by no more than about 10%. In addition, consideration is given
to the percentage G + C content of the degenerate third base (monocotyledons appear to favor G + C in this
position, whereas dicotyledons do not). It is also recognized that the XCG nucleotide is the least preferred
codon in dicots, whereas the XTA codon is avoided in both monocots and dicots. The synthetic genes of this
invention also preferably have CG and TA doublet avoidance indices, as defined in the Detailed Description,
closely approximating those of the chosen host plant. More preferably these indices deviate from that of the
host by no more than about 10-15%.

Assembly of the Bt gene of this invention is performed using standard technology known to the art. The
Btt structural gene designed for enhanced expression in plants of the specific embodiment is enzymatically
assembled within a DNA vector from chemically synthesized oligonucleotide duplex segments. The
synthetic Bt gene is then introduced into a plant host cell and expressed by means known to the art. The
insecticidal protein produced upon expression of the synthetic Bt gene in plants is functionally equivalent to
a native Bt crystal protein in having toxicity to the same insects.

BRIEF DESCRIPTION OF THE FIGURES

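Figure 1 presents the nucleotide sequence of the synthetic Btt gene, compared with the native sequence
found in p544Pst-Met5. Changes in amino acids occur in the synthetic sequence with alanine replacing
threonine at residue 2 and a second change at residue 596, followed by the addition of amino acids at the
C-terminus.

Figure 2 represents a simplified scheme used in the construction of the synthetic Btt gene. Segments
A through M represent oligonucleotide pieces annealed and ligated together to form DNA duplexes having
unique splice sites to allow specific enzymatic assembly of the DNA segments to give the desired gene.

Figure 3 is a schematic diagram showing the assembly of oligonucleotide segments in the construction
of a synthetic Btt gene. Each segment (A through M) is built from oligonucleotides of different sizes,
annealed and ligated to form the desired DNA segment.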

DETAILED DESCRIPTION OF THE INVENTION 

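The following definitions are provided in order to provide clarity as to the intent or scope of their usage
in the specification and claims.

Expression refers to the transcription and translation of a structural gene to yield the encoded protein.
The synthetic Bt genes of the present invention are designed to be expressed at a higher level in plants
than the corresponding native Bt genes. As will be appreciated by those skilled in the art, structural gene
expression levels are affected by the regulatory DNA sequences (promoter, polyadenylation sites,
enhancers, etc.) employed and by the host cell in which the structural gene is expressed. Comparisons of
synthetic Bt gene expression and native Bt gene expression must be made employing analogous regulatory
sequences and in the same host cell. It will also be apparent that analogous means of assessing gene
expression must be employed in such comparisons.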

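Promoter refers to the nucleotide sequences at the 5' end of a structural gene which direct the initiation
of transcription. Promoter sequences are necessary, but not always sufficient, to drive the expression of a
downstream gene. In prokaryotes, the promoter drives transcription by providing binding sites to RNA
polymerases and other initiation and activation factors. Usually promoters drive transcription preferentially in
the downstream direction, although promotional activity can be demonstrated (at a reduced level of
expression) when the gene is placed upstream of the promoter. The level of transcription is regulated by
promoter sequences. Thus, in the construction of heterologous promoter/structural gene combinations, the
structural gene is placed under the regulatory control of a promoter such that the expression of the gene is
controlled by promoter sequences. The promoter is positioned preferentially upstream to the structural gene
and at a distance from the transcription start site that approximates the distance between the promoter and
the gene it controls in its natural setting. As is known in the art, some variation in this distance can be
tolerated without loss of promoter function.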

A gene refers to the entire DNA portion involved in the synthesis of a protein. A gene embodies the
structural or coding portion which begins at the 5' end from the translational start codon (usually ATG) and
extends to the stop (TAG, TGA or TAA) codon at the 3' end. It also contains a promoter region, usually
located 5' or upstream to the structural gene, which initiates and regulates the expression of a structural
gene. Also included in a gene are the 3' end and poly(A)+ addition sequences.

Structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or
a portion thereof, and excluding the 5' sequence which drives the initiation of transcription. The structural
gene may be one which is normally found in the cell or one which is not normally found in the cellular
location wherein it is introduced, in which case it is termed a heterologous gene. A heterologous gene may
be derived in whole or in part from any source known to the art, including a bacterial genome or episome,
eukaryotic, nuclear or plasmid DNA, cDNA, viral DNA or chemically synthesized DNA. A structural gene
may contain one or more modifications in either the coding or the untranslated regions which could affect
the biological activity or the chemical structure of the expression product, the rate of expression or the
manner of expression control. Such modifications include, but are not limited to, mutations, insertions,
deletions and substitutions of one or more nucleotides. The structural gene may constitute an uninterrupted
coding sequence or it may include one or more introns, bounded by the appropriate splice junctions. The
structural gene may be a composite of segments derived from a plurality of sources, naturally occurring or
synthetic. The structural gene may also encode a fusion protein.

Synthetic gene refers to a DNA sequence of a structural gene that is chemically synthesized in its
entirety or for the greater part of the coding region. As exemplified herein, oligonucleotide building blocks
are synthesized using procedures known to those skilled in the art and are ligated and annealed to form
gene segments which are then enzymatically assembled to construct the entire gene. As is recognized by
those skilled in the art, functionally and structurally equivalent genes to the synthetic genes described
herein may be prepared by site-specific mutagenesis or other related methods used in the art.

thatSssircoi^ratSnr ' ' - 

Plant tissue includes differentiated and undifferentiated tissues of plants, including but not limited to
roots, shoots, leaves, pollen, seeds, tumor tissue and various forms of cells in culture, such as single cells,
protoplasts, embryos and callus tissue. The plant tissue may be in planta or in organ, tissue or cell culture.

Plant cell as used herein includes plant cells in planta and plant cells and protoplasts in culture.

Homology refers to identity or near identity of nucleotide or amino acid sequences. Homologous
sequences may be identified by those skilled in the art using the test of cross-hybridization of nucleic acids
under conditions of stringency as is well understood in the art (as described in Hames and Higgins (eds.)
(1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK). Extent of homology is often measured in terms of
percentage of identity between the sequences compared.
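
By way of illustration only, percentage of identity between two sequences can be sketched as below. This is a minimal sketch assuming the two sequences have already been aligned to equal length; the function name percent_identity and the toy sequences are hypothetical and are not taken from the Btt gene or from this specification.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical six-codon fragments differing at four single bases.
native_fragment = "ATGACTGCAGATAATAAT"
synthetic_fragment = "ATGGCTGCAGACAACAAC"
print(round(percent_identity(native_fragment, synthetic_fragment), 1))
```

A figure such as the approximately 85% homology recited herein would be obtained in this manner over the full aligned coding regions, under whatever alignment convention is adopted.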

Functionally equivalent refers to identity or near identity of function. A synthetic gene product which is
toxic to at least one of the same insect species as a natural Bt protein is considered functionally equivalent
thereto. As exemplified herein, both natural and synthetic Btt genes encode 65 kDa insecticidal proteins
having essentially identical amino acid sequences and having toxicity to coleopteran insects. The synthetic
Bt genes of the present invention are not considered to be functionally equivalent to native Bt genes, since
they are expressible at a higher level in plants than native Bt genes.

Frequency of preferred codon usage refers to the preference exhibited by a specific host cell in usage of
nucleotide codons to specify a given amino acid. To determine the frequency of usage of a particular
codon in a gene, the number of occurrences of that codon in the gene is divided by the total number of
occurrences of all codons specifying the same amino acid in the gene. Table 1, for example, gives the
frequency of codon usage for Bt genes, which was obtained by analysis of four Bt genes whose sequences
are publicly available. Similarly, the frequency of preferred codon usage exhibited by a host cell can be
calculated by averaging frequency of preferred codon usage in a large number of genes expressed by the
host cell. It is preferable that this analysis be limited to genes that are highly expressed by the host cell.
Table 1, for example, gives the frequency of codon usage by highly expressed genes exhibited by
dicotyledonous plants and monocotyledonous plants. The dicot codon usage was calculated using
highly expressed coding sequences obtained from Genbank which are listed in Table 1. Monocot codon
usage was calculated using monocot coding sequences obtained from Genbank.
When synthesizing a gene for improved expression in a host cell it is desirable to design the gene such
that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.
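
The frequency calculation defined above (occurrences of a codon divided by occurrences of all codons specifying the same amino acid) can be sketched as follows. This is an illustrative sketch only: the function name codon_usage_frequency and the toy sequence are hypothetical, the standard genetic code is assumed, and stop codons are left out of the result.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_usage_frequency(coding_sequence: str) -> dict:
    """Map each sense codon found in the gene to its share among synonymous codons."""
    seq = coding_sequence.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    codon_counts = Counter(c for c in codons if CODON_TABLE[c] != "*")
    amino_acid_totals = defaultdict(int)
    for codon, count in codon_counts.items():
        amino_acid_totals[CODON_TABLE[codon]] += count
    return {
        codon: count / amino_acid_totals[CODON_TABLE[codon]]
        for codon, count in codon_counts.items()
    }


# Hypothetical toy gene: Met, four valines (GTA, GTT, GTT, GTC), Trp, stop.
toy_gene = "ATGGTAGTTGTTGTCTGGTAA"
for codon, freq in sorted(codon_usage_frequency(toy_gene).items()):
    print(codon, CODON_TABLE[codon], round(freq, 2))
```

A host-cell profile of the kind given in Table 1 would be obtained by averaging such per-gene frequencies over many highly expressed genes of the host.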

The percent deviation of the frequency of preferred codon usage for a synthetic gene from that
employed by a host cell is calculated first by determining the percent deviation of the frequency of usage of
a single codon from that of the host cell followed by obtaining the average deviation over all codons. As
defined herein this calculation includes unique codons (i.e., ATG and TGG). The frequency of preferred
codon usage of the synthetic Btt gene, whose sequence is given in Figure 1, is given in Table 1. The
frequency of preferred usage of the codon 'GTA' for valine in the synthetic gene (0.10) deviates from that
preferred by dicots (0.12) by 0.02/0.12 = 0.167 or 16.7%. The average deviation over all amino acid
codons of the Btt synthetic gene codon usage from that of dicot plants is 7.8%. In general terms the overall
average deviation of the codon usage of a synthetic gene from that of a host cell is calculated using the
equation

$$\text{average deviation (\%)} = \frac{1}{Z}\sum_{n=1}^{Z}\frac{\lvert X_n - Y_n\rvert}{X_n}\times 100$$

where X_n = frequency of usage for codon n in the host cell; Y_n = frequency of usage for codon n in the
synthetic gene. Where n represents an individual codon that specifies an amino acid, the total number of
codons is Z, which in the preferred embodiment is 61. The overall deviation of the frequency of codon
usage for all amino acids should preferably be less than about 25%, and more preferably less than about
10%.
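
The averaging described above can be sketched as follows. This is an illustrative sketch only, assuming the host and synthetic-gene frequencies are supplied as dictionaries keyed by codon; the function name average_percent_deviation is hypothetical, and apart from the GTA values (0.12 for dicots, 0.10 for the synthetic gene) the numbers in the example are placeholders.

```python
def average_percent_deviation(host_freq: dict, gene_freq: dict) -> float:
    """Mean over codons of |X_n - Y_n| / X_n * 100, with X_n the host frequency."""
    if not host_freq:
        raise ValueError("host frequency table must not be empty")
    deviations = []
    for codon, x_n in host_freq.items():
        if x_n == 0:
            raise ValueError(f"host frequency for {codon} is zero; deviation undefined")
        y_n = gene_freq.get(codon, 0.0)  # a codon absent from the gene has frequency 0
        deviations.append(abs(x_n - y_n) / x_n * 100.0)
    return sum(deviations) / len(deviations)  # Z = number of codons compared


# GTA values are those quoted in the text; the ATG entry is a placeholder (unique codon).
host = {"GTA": 0.12, "ATG": 1.0}
gene = {"GTA": 0.10, "ATG": 1.0}
print(round(average_percent_deviation({"GTA": 0.12}, {"GTA": 0.10}), 1))
print(round(average_percent_deviation(host, gene), 1))
```

In the preferred embodiment the dictionaries would hold all 61 sense codons, and the result would be compared against the 25% and 10% limits stated above.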

Derived from is used to mean taken, obtained, received, traced, replicated or descended from a source 
(chemical and/or biological). A derivative may be produced by chemical or biological manipulation 
(including but not limited to substitution, addition, insertion, deletion, extraction, isolation, mutation and 
replication) of the original source. 

Chemically synthesized, as related to a sequence of DNA, means that the component nucleotides were
assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well established
procedures (Caruthers, M. (1983) in Methodology of DNA and RNA Sequencing, Weissman (ed.), Praeger
Publishers, New York, Chapter 1), or automated chemical synthesis can be performed using one of a
number of commercially available machines.

The term designed to be highly expressed, as used herein, refers to a level of expression of a designed
gene wherein the amount of its specific mRNA transcripts produced is sufficient to be quantified in Northern 
blots and, thus, represents a level of specific mRNA expressed corresponding to greater than or equal to 
approximately 0.001% of the poly(A)+ mRNA. To date, natural Bt genes are transcribed at a level wherein 
the amount of specific mRNA produced is insufficient to be estimated using the Northern blot technique. 
However, in the present invention, transcription of a synthetic Bt gene designed to be highly expressed not 
only allows quantification of the specific mRNA transcripts produced but also results in enhanced 
expression of the translation product which is measured in insecticidal bioassays.

Crystal protein or insecticidal crystal protein or crystal toxin refers to the major protein component of 
the parasporal crystals formed in strains of Bt. This protein component exhibits selective pathogenicity to 
different species of insects. The molecular size of the major protein isolated from parasporal crystals varies 
depending on the strain of Bt from which it is derived. Crystal proteins having molecular weights of 
approximately 132, 65, and 28 kDa have been reported. It has been shown that the approximately 132 kDa
protein is a protoxin that is cleaved to form an approximately 65 kDa toxin. 

The crystal protein gene refers to the DNA sequence encoding the insecticidal crystal protein in either 
full length protoxin or toxin form, depending on the strain of Bt from which the gene is derived.

The authors of this invention observed that expression in plants of Bt crystal protein mRNA occurs at
levels that are not routinely detectable in Northern blots and that low levels of Bt crystal protein expression
correspond to this low level of mRNA expression. It is preferred for exploitation of these genes as potential
biocontrol methods that the level of expression of Bt genes in plant cells be improved and that the stability
of Bt mRNA in plants be optimized. This will allow greater levels of Bt mRNA to accumulate and will result
in an increase in the amount of insecticidal protein in plant tissues. This is essential for the control of
insects that are relatively resistant to Bt protein.

Thus, this invention is based on the recognition that expression levels of desired, recombinant
insecticidal protein in transgenic plants can be improved via increased expression of stabilized mRNA
transcripts; and that, conversely, detection of these stabilized RNA transcripts may be utilized to measure
expression of translational product (protein). This invention provides a means of resolving the problem of
low expression of insecticidal protein RNA in plants and, therefore, of low protein expression through the
use of an improved, synthetic gene specifying an insecticidal crystal protein from Bt.

Attempts to improve the levels of expression of Bt genes in plants have centered on comparative
studies evaluating parameters such as gene type, gene length, choice of promoters, addition of plant viral
untranslated RNA leader, addition of intron sequence and modification of nucleotides surrounding the
initiation ATG codon. To date, changes in these parameters have not led to significant enhancement of Bt
protein expression in plants. Applicants find that, surprisingly, to express Bt proteins at the desired level in
plants, modifications in the coding region of the gene were effective. Structure-function relationships can be
studied using site-specific mutagenesis by replacement of restriction fragments with synthetic DNA
duplexes containing the desired nucleotide changes (Lo et al. (1984) Proc. Natl. Acad. Sci. 81:2285-2289).
However, recent advances in recombinant DNA technology now make it feasible to chemically synthesize
an entire gene designed specifically for a desired function. Thus, the Btt coding region was chemically
synthesized, modified in such a way as to improve its expression in plants. Also, gene synthesis provides
the opportunity to design the gene so as to facilitate its subsequent mutagenesis by incorporating a number
of appropriately positioned restriction endonuclease sites into the gene.

The present invention provides a synthetic Bt gene for a crystal protein toxic to an insect. As
exemplified herein, this protein is toxic to coleopteran insects. To the end of improving expression of this
insecticidal protein in plants, this invention provides a DNA segment homologous to a Btt structural gene
and, as exemplified herein, having approximately 85% homology to the Btt structural gene in p544Pst-Met5.
In this embodiment the structural gene encoding a Btt insecticidal protein is obtained through chemical
synthesis of the coding region. A chemically synthesized gene is used in this embodiment because it best
allows for easy and efficacious accommodation of modifications in nucleotide sequences required to
achieve improved levels of cross-expression.

Today, in general, chemical synthesis is a preferred method to obtain a desired modified gene.
However, to date, no plant protein gene has been chemically synthesized nor has any synthetic gene for a
bacterial protein been expressed in plants. In this invention, the approach adopted for synthesizing the gene
consists of designing an improved nucleotide sequence for the coding region and assembling the gene
from chemically synthesized oligonucleotide segments. In designing the gene, the coding region of the
naturally-occurring gene, preferably from the Btt subclone, p544Pst-Met5, encoding a 65 kDa polypeptide
having coleopteran toxicity, is scanned for possible modifications which would result in improved expression
of the synthetic gene in plants. For example, to optimize the efficiency of translation, codons preferred in
highly expressed proteins of the host cell are utilized.

Bias in codon choice within genes in a single species appears related to the level of expression of the
protein encoded by that gene. Codon bias is most extreme in highly expressed proteins of E. coli and
yeast. In these organisms, a strong positive correlation has been reported between the abundance of an
isoaccepting tRNA species and the favored synonymous codon. In one group of highly expressed proteins
in yeast, over 96% of the amino acids are encoded by only 25 of the 61 available codons (Bennetzen and
Hall (1982) J. Biol. Chem. 257:3026-3031).

These 25 codons are preferred in all sequenced yeast genes, but the degree of preference varies with
the level of expression of the genes. Recently, Hoekema and colleagues (1987) Mol. Cell. Biol. 7:2914-2924
reported that replacement of these 25 preferred codons by minor codons in the 5' end of the highly
expressed yeast gene PGK1 results in a decreased level of both protein and mRNA. They concluded that
biased codon choice in highly expressed genes enhances translation and is required for maintaining mRNA
stability in yeast. Without doubt, the degree of codon bias is an important factor to consider when
engineering high expression of heterologous genes in yeast and other systems.

Experimental evidence obtained from point mutations and deletion analysis has indicated that in
eukaryotic genes specific sequences are associated with posttranscriptional processing, RNA
destabilization, translational termination, intron splicing and the like. These are preferably employed in the
synthetic genes of this invention. In designing a bacterial gene for expression in plants, sequences which
interfere with the efficacy of gene expression are eliminated.

In designing a synthetic gene, modifications in nucleotide sequence of the coding region are made to
modify the A + T content in DNA base composition of the synthetic gene to reflect that normally found in
genes for highly expressed proteins native to the host cell. Preferably the A + T content of the synthetic
gene is substantially equal to that of said genes for highly expressed proteins. In genes encoding highly
expressed plant proteins, the A + T content is approximately 55%. It is preferred that the synthetic gene
have an A + T content near this value, and not so high as to cause destabilization of RNA and,
therefore, lower the protein expression levels. More preferably, the A + T content is no more than about 60%
and most preferably is about 55%. Also, for ultimate expression in plants, the synthetic gene nucleotide
sequence is preferably modified to form a plant initiation sequence at the 5' end of the coding region. In
addition, particular attention is preferably given to assure that unique restriction sites are placed in strategic
positions to allow efficient assembly of oligonucleotide segments during construction of the synthetic gene
and to facilitate subsequent nucleotide modification. As a result of these modifications in the coding region
of the native Bt gene, the preferred synthetic gene is expressed in plants at an enhanced level when
compared to that observed with natural Bt structural genes.
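
For illustration, the A + T criterion above can be checked with a short routine. This is a minimal sketch under stated assumptions: the function names at_content_percent and exceeds_at_ceiling are hypothetical, the 60% ceiling is the "no more than about 60%" figure given above, and the example sequences are toy strings rather than any part of the Btt gene.

```python
def at_content_percent(sequence: str) -> float:
    """Percentage of bases in the sequence that are A or T."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def exceeds_at_ceiling(sequence: str, ceiling_percent: float = 60.0) -> bool:
    """True when the A + T content is above the stated upper limit."""
    return at_content_percent(sequence) > ceiling_percent


# Hypothetical toy sequences.
print(at_content_percent("AATTGGCC"))
print(exceeds_at_ceiling("AAATTTAGC"))
```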

In specific embodiments, the synthetic Bt gene of this invention encodes a Btt protein toxic to
coleopteran insects. Preferably, the toxic polypeptide is about 598 amino acids in length, is at least 75%
homologous to a Btt polypeptide, and, as exemplified herein, is essentially identical to the protein encoded
by p544Pst-Met5, except for replacement of threonine by alanine at residue 2. This amino acid substitution
results as a consequence of the necessity to introduce a guanine base at position +4 in the coding
sequence.

In designing the synthetic gene of this invention, the coding region from the Btt subclone,
p544Pst-Met5, encoding a 65 kDa polypeptide having coleopteran toxicity, is scanned for possible
modifications which would result in improved expression of the synthetic gene in plants. For example, in
preferred embodiments, the synthetic insecticidal protein is strongly expressed in dicot plants, e.g., tobacco,
tomato, cotton, etc., and hence, a synthetic gene under these conditions is designed to incorporate to
advantage codons used preferentially by highly expressed dicot proteins. In embodiments where enhanced
expression of insecticidal protein is desired in a monocot, codons preferred by highly expressed monocot
proteins (given in Table 1) are employed in designing the synthetic gene.

In general, genes within a taxonomic group exhibit similarities in codon choice, regardless of the
function of these genes. Thus an estimate of the overall use of the genetic code by a taxonomic group can
be obtained by summing codon frequencies of all its sequenced genes. This species-specific codon choice
is reported in this invention from analysis of 208 plant genes. Both monocot and dicot plants are analyzed
individually to determine whether these broader taxonomic groups are characterized by different patterns of
synonymous codon preference. The 208 plant genes included in the codon analysis code for proteins
having a wide range of functions and they represent 6 monocot and 36 dicot species. These proteins are
present in different plant tissues at varying levels of expression.

In this invention it is shown that the relative use of synonymous codons differs between the monocots
and the dicots. In general, the most important factor in discriminating between monocot and dicot patterns
of codon usage is the percentage G + C content of the degenerate third base. In monocots, 16 of 18 amino
acids favor G + C in this position, while dicots favor G + C in only 7 of 18 amino acids.
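The third-base G + C measure can be illustrated with a minimal sketch for a single coding sequence. This is a simplification and an assumption of this example: the analysis described above tallies preferences per amino acid across many genes, whereas the hypothetical function below reports one pooled fraction for one sequence.

```python
def third_base_gc_fraction(coding_seq: str) -> float:
    """Fraction of degenerate codons that end in G or C."""
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("length of a coding sequence must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # ATG (Met) and TGG (Trp) have no synonymous alternative, so they carry
    # no information about codon preference and are skipped. Any stop codon
    # present is counted like an ordinary codon in this sketch.
    degenerate = [c for c in codons if c not in ("ATG", "TGG")]
    if not degenerate:
        raise ValueError("no degenerate codons found")
    return sum(c[2] in "GC" for c in degenerate) / len(degenerate)


# Hypothetical fragment: ATG GCC AAG TGG GCT -> GCC, AAG, GCT are scored.
print(round(third_base_gc_fraction("ATGGCCAAGTGGGCT"), 3))
```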

The G-ending codons for Thr, Pro, Ala and Ser are avoided in both monocots and dicots because they
contain C in codon position II. The CG dinucleotide is strongly avoided in plants (Boudraa (1987) Genet.
Sel. Evol. 19:143-154) and other eukaryotes (Grantham et al. (1985) Bull. Inst. Pasteur 83:95-148), possibly
due to regulation involving methylation. In dicots, XCG is always the least favored codon, while in monocots
this is not the case. The doublet TA is also avoided in codon positions II and III in most eukaryotes, and this
is true of both monocots and dicots.

Grantham and colleagues ((1986) Oxford Surveys in Evol. Biol. 3:48-81) have developed two codon
choice indices to quantify CG and TA doublet avoidance in codon positions II and III. XCG/XCC is the ratio
of G-ending to C-ending triplets among codons having C as base II, while XTA/XTT is the ratio of A-ending to
T-ending triplets with T as the second base. These indices have been calculated for the plant data in this
paper (Table 2) and support the conclusion that monocot and dicot species differ in their use of these
dinucleotides.
Table 2

Avoidance of CG and TA doublets in codon positions II-III.
XCG/XCC and XTA/XTT values are multiplied by 100.

Group        XCG/XCC   XTA/XTT
Plants          40        37
Dicots          30        35
Monocots        61        47
Maize           67        43
Soybean         37        41
RuBPC SSU       18         9
CAB             22        13

RuBPC SSU = ribulose 1,5 bisphosphate small subunit
CAB = chlorophyll a/b binding protein
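As an illustration of how the two doublet indices are obtained from sequence data, the following sketch pools codon counts over a set of in-frame coding sequences and forms the two ratios, multiplied by 100 as in Table 2. The function name and the toy input are hypothetical; the values in Table 2 come from the plant gene sample, not from this code.

```python
from collections import Counter


def doublet_indices(coding_seqs):
    """Return (XCG/XCC, XTA/XTT), each multiplied by 100, for pooled sequences."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        # Count complete in-frame codons only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1

    def total(suffix):
        # Sum over all codons whose bases II and III equal the given doublet.
        return sum(n for codon, n in counts.items() if codon[1:] == suffix)

    xcg, xcc = total("CG"), total("CC")
    xta, xtt = total("TA"), total("TT")
    return (100 * xcg / xcc if xcc else None,
            100 * xta / xtt if xtt else None)


# Toy input: GCG GCC GCC CTA TTT CTT
print(doublet_indices(["GCGGCCGCCCTATTTCTT"]))
```

A value well below 100 indicates avoidance of the CG or TA doublet relative to the reference doublet.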


Additionally, for two species, soybean and maize, species-specific codon usage profiles were calculated 
(not shown). The maize codon usage pattern resembles that of monocots in general, since these sequences 
represent over half of the monocot sequences available. The codon profile of the maize subsample is even 
more strikingly biased in its preference for G+C in codon position III. On the other hand, the soybean 
codon usage pattern is almost identical to the general dicot pattern, even though it represents a much 
smaller portion of the entire dicot sample. 

In order to determine whether the coding strategy of highly expressed genes such as the ribulose 1,5 
bisphosphate small subunit (RuBPC SSU) and chlorophyll a/b binding protein (CAB) is more biased than 
that of plant genes in general, codon usage profiles for subsets of these genes (19 and 17 sequences, 
respectively) were calculated (not shown). The RuBPC SSU and CAB pooled samples are characterized by 
stronger avoidance of the codons XCG and XTA than in the larger monocot and dicot samples (Table 2). 
Although most of the genes in these subsamples are dicot in origin (17/19 and 15/17), their codon profile
resembles that of the monocots in that G + C is utilized in the degenerate base III.

The use of pooled data for highly expressed genes may obscure identification of species-specific
patterns in codon choice. Therefore, the codon choices of individual genes for RuBPC SSU and CAB were
tabulated. The preferred codons of the maize and wheat genes for RuBPC SSU and CAB are more
restricted in general than are those of the dicot species. This is in agreement with Matsuoka et al. ((1987) J.
Biochem. 202:673-676), who noted the extreme codon bias of the maize RuBPC SSU gene as well as two
other highly expressed genes in maize leaves, CAB and phosphoenolpyruvate carboxylase. These genes
almost completely avoid the use of A + T in codon position III, although this codon bias was not as
pronounced in non-leaf proteins such as alcohol dehydrogenase, zein 22 kDa sub-unit, sucrose synthetase
and ATP/ADP translocator. Since the wheat SSU and CAB genes have a similar pattern of codon
preference, this may reflect a common monocot pattern for these highly expressed genes in leaves. The
CAB gene for Lemna and the RuBPC SSU genes for Chlamydomonas share a similar extreme preference for
G + C in codon position III. In dicot CAB genes, however, A + T degenerate bases are preferred by some
synonymous codons (e.g., GCT for Ala, CTT for Leu, GGA and GGT for Gly). In general, the G + C
preference is less pronounced for both RuBPC SSU and CAB genes in dicots than in monocots.

In designing a synthetic gene for expression in plants, attempts are also made to eliminate sequences
which interfere with the efficacy of gene expression. Sequences such as the plant polyadenylation signals,
e.g., AATAAA, polymerase II termination sequence, e.g., CAN7-9AGTNNAA, UCUUCGG hairpins and plant
consensus splice sites are highlighted and, if present in the native Btt coding sequence, are modified so as
to eliminate potentially deleterious sequences.

Modifications in nucleotide sequence of the Btt coding region are also preferably made to reduce the
A + T content in DNA base composition. The Btt coding region has an A + T content of 64%, which is about
10% higher than that found in a typical plant coding region. Since A + T-rich regions typify plant intergenic
regions and plant regulatory regions, it is deemed prudent to reduce the A + T content. The synthetic Btt
gene is designed to have an A + T content of 55%, in keeping with values usually found in plants.

Also, a single modification (to introduce guanine in lieu of adenine) at the fourth nucleotide position in
the Btt coding sequence is made in the preferred embodiment to form a sequence consonant with that
believed to function as a plant initiation sequence (Taylor et al. (1987) Mol. Gen. Genet. 210:572-577) in
optimization of expression. In addition, in exemplifying this invention thirty-nine nucleotides (thirteen
codons) are added to the coding region of the synthetic gene in an attempt to stabilize primary transcripts.
However, it appears that equally stable transcripts are obtained in the absence of this extension
of thirty-nine nucleotides.

Not all of the above-mentioned modifications of the natural Bt gene must be made in constructing a
synthetic Bt gene in order to obtain enhanced expression. For example, a synthetic gene may be
synthesized for other purposes in addition to that of achieving enhanced levels of expression. Under these
conditions, the original sequence of the natural Bt gene may be preserved within a region of DNA
corresponding to one or more, but not all, segments used to construct the synthetic gene. Depending on
the desired purpose of the gene, modification may encompass substitution of one or more, but not all, of
the oligonucleotide segments used to construct the synthetic gene by a corresponding region of natural Bt
sequence.

As is known to those skilled in the art of synthesizing genes (Mandecki et al. (1985) Proc. Natl. Acad.
Sci. 82:3543-3547; Feretti et al. (1986) Proc. Natl. Acad. Sci. 83:599-603), the DNA sequence to be
synthesized is divided into segment lengths which can be synthesized conveniently and without undue
complication. As exemplified herein, in preparing to synthesize the Btt gene, the coding region is divided
into thirteen segments (A - M). Each segment has unique restriction sequences at the cohesive ends.
Segment A, for example, is 228 base pairs in length and is constructed from six oligonucleotide sections,
each containing approximately 75 bases. Single-stranded oligonucleotides are annealed and ligated to form
DNA segments. The length of the protruding cohesive ends in complementary oligonucleotide segments is
four to five residues. In the strategy evolved for gene synthesis, the sites designed for the joining of
oligonucleotide pieces and DNA segments are different from the restriction sites created in the gene.

In the specific embodiment, each DNA segment is cloned into a pIC-20 vector for amplification of the
DNA. The nucleotide sequence of each fragment is determined at this stage by the dideoxy method using
the recombinant phage DNA as templates and selected synthetic oligonucleotides as primers.

As exemplified herein and illustrated schematically in Figures 3 and 4, each segment individually (e.g.,
segment M) is excised at the flanking restriction sites from its cloning vector and spliced into the vector
containing segment A. Most often, segments are added as a paired segment instead of as a single segment
to increase efficiency. Thus, the entire gene is constructed in the original plasmid harboring segment A. The
nucleotide sequence of the entire gene is determined and found to correspond exactly to that shown in
Figure 1.

In preferred embodiments the synthetic Btt gene is expressed in plants at an enhanced level when
compared to that observed with natural Btt structural genes. To that end, the synthetic structural gene is
combined with a promoter functional in plants, the structural gene and the promoter region being in such
position and orientation with respect to each other that the structural gene can be expressed in a cell in
which the promoter region is active, thereby forming a functional gene. The promoter regions include, but
are not limited to, bacterial and plant promoter regions. To express the promoter region/structural gene
combination, the DNA segment carrying the combination is contained by a cell. Combinations which include
plant promoter regions are contained by plant cells, which, in turn, may be contained by plants or seeds.
Combinations which include bacterial promoter regions are contained by bacteria, e.g., Bt or E. coli. Those
in the art will recognize that expression in types of micro-organisms other than bacteria may in some
circumstances be desirable and, given the present disclosure, feasible without undue experimentation.

The recombinant DNA molecule carrying a synthetic structural gene under promoter control can be
introduced into plant tissue by any means known to those skilled in the art. The technique used for a given
plant species or specific type of plant tissue depends on the known successful techniques. As novel means
are developed for the stable insertion of foreign genes into plant cells and for manipulating the modified
cells, skilled artisans will be able to select from known means to achieve a desired result. Means for
introducing recombinant DNA into plant tissue include, but are not limited to, direct DNA uptake
(Paszkowski, J. et al. (1984) EMBO J. 3:2717), electroporation (Fromm, M. et al. (1985) Proc. Natl. Acad.
Sci. USA 82:5824), microinjection (Crossway, A. et al. (1986) Mol. Gen. Genet. 202:179), or T-DNA
mediated transfer from Agrobacterium tumefaciens to the plant tissue. There appears to be no fundamental
limitation of T-DNA transformation to the natural host range of Agrobacterium. Successful T-DNA-mediated
transformation of monocots (Hooykaas-Van Slogteren, G. et al. (1984) Nature 311:763), gymnosperm
(Dandekar, A. et al. (1987) Biotechnology 5:587) and algae (Ausich, R., EPO application 108,580) has been
reported. Representative T-DNA vector systems are described in the following references: An, G. et al.
(1985) EMBO J. 4:277; Herrera-Estrella, L. et al. (1983) Nature 303:209; Herrera-Estrella, L. et al. (1983)
EMBO J. 2:987; Herrera-Estrella, L. et al. (1985) in Plant Genetic Engineering, New York: Cambridge
University Press, p. 63. Once introduced into the plant tissue, the expression of the structural gene may be
assayed by any means known to the art, and expression may be measured as mRNA transcribed or as
protein synthesized. Techniques are known for the in vitro culture of plant tissue, and in a number of cases,
for regeneration into whole plants. Procedures for transferring the introduced expression complex to
commercially useful cultivars are known to those skilled in the art.

In one of its preferred embodiments the invention disclosed herein comprises expression in plant cells
of a synthetic insecticidal structural gene under control of a plant expressible promoter, that is to say by
inserting the insecticide structural gene into T-DNA under control of a plant expressible promoter and
introducing the T-DNA containing the insert into a plant cell using known means. Once plant cells
expressing a synthetic insecticidal structural gene under control of a plant expressible promoter are
obtained, plant tissues and whole plants can be regenerated therefrom using methods and techniques
well-known in the art. The regenerated plants are then reproduced by conventional means and the introduced
genes can be transferred to other strains and cultivars by conventional plant breeding techniques.

The introduction and expression of the synthetic structural gene for an insecticidal protein can be used
to protect a crop from infestation with common insect pests. Other uses of the invention, exploiting the
properties of other insecticide structural genes introduced into other plant species, will be readily apparent
to those skilled in the art. The invention in principle applies to introduction of any synthetic insecticide
structural gene into any plant species into which foreign DNA (in the preferred embodiment T-DNA) can be
introduced and in which said DNA can remain stably replicated. In general, these taxa presently include, but
are not limited to, gymnosperms and dicotyledonous plants, such as sunflower (family Compositeae),
tobacco (family Solanaceae), alfalfa, soybeans and other legumes (family Leguminoseae), cotton (family
Malvaceae), and most vegetables, as well as monocotyledonous plants. A plant containing in its tissues
increased levels of insecticidal protein will control less susceptible types of insect, thus providing advantage
over present insecticidal uses of Bt. By incorporation of the insecticidal protein into the tissues of a plant,
the present invention additionally provides advantage over present uses of insecticides by eliminating
instances of nonuniform application and the costs of buying and applying insecticidal preparations to a field.
Also, the present invention eliminates the need for careful timing of application of such preparations since
small larvae are most sensitive to insecticidal protein and the protein is always present, minimizing crop
damage that would otherwise result from preapplication larval foraging.

This invention combines the specific teachings of the present disclosure with a variety of techniques
and expedients known in the art. The choice of expedients depends on variables such as the choice of
insecticidal protein from a Bt strain, the extent of modification in preferred codon usage, manipulation of
sequences considered to be destabilizing to RNA or sequences prematurely terminating transcription,
insertions of restriction sites within the design of the synthetic gene to allow future nucleotide modifications,
addition of introns or enhancer sequences to the 5' and/or 3' ends of the synthetic structural gene, the
promoter region, the host in which a promoter region/structural gene combination is expressed, and the like.
As novel insecticidal proteins and toxic polypeptides are discovered, and as sequences responsible for
enhanced cross-expression (expression of a foreign structural gene in a given host) are elucidated, those of
ordinary skill will be able to select among those elements to produce "improved" synthetic genes for
desired proteins having agronomic value. The fundamental aspect of the present invention is the ability to
synthesize a novel gene coding for an insecticidal protein, designed so that the protein will be expressed at
an enhanced level in plants, yet so that it will retain its inherent property of insect toxicity and retain or
increase its specific insecticidal activity.


EXAMPLES

The following Examples are presented as illustrations of embodiments of the present invention. They do
not limit the scope of this invention, which is determined by the claims.

The following strains were deposited with the Patent Culture Collection, Northern Regional Research
Center, 1815 N. University Street, Peoria, Illinois 61604.


Strain                           Deposited on      Accession #
E. coli MC1061 (p544-HindIII)    6 October 1987    NRRL B-18257
E. coli MC1061 (p544Pst-Met5)    6 October 1987    NRRL B-18258


The deposited strains are provided for the convenience of those in the art, and are not necessary to
practice the present invention, which may be practiced with the present disclosure in combination with
publicly available protocols, information, and materials. E. coli MC1061, a good host for plasmid transforma-
Table 1 (CONTINUED)

17. Ryder, T.B. et al. (1987) Mol. Gen. Genet. 210:219-233.

18. Llewellyn, D.J. et al. (1987) J. Mol. Biol. 195:115-123.

19. Tingey, S.V. et al. (1987) EMBO J. 6:1-9.

20. Gantt, J.S. and Key, J.L. (1987) Eur. J. Biochem. 166:119-125.

21. Guidet, F. and Fourcroy, P. (1988) Nucl. Acids Res. 16:2336.

22. Salanoubat, M. and Belliard, G. (1987) Gene 60:47-56.

23. Volokita, M. and Somerville, C.R. (1987) J. Biol. Chem. 262:15825-15828.

24. Bassner, R. et al. (1987) Nucl. Acids Res. 15:9609.

25. Chojecki, J. (1986) Carlsberg Res. Commun. 51:211-217.

26. Bohlmann, H. and Apel, K. (1987) Mol. Gen. Genet. 207:446-454.

27. Nielsen, P.S. and Gausing, K. (1987) FEBS Lett. 225:159-162.

28. Higuchi, W. and Fukazawa, C. (1987) Gene 55:245-253.

29. Bethards, L.A. et al. (1987) Proc. Natl. Acad. Sci. USA 81:6830-6834.

30. Paz-Ares, J. et al. (1987) EMBO J. 6:3553-3558.

For example, dicots utilize the AAG codon for lysine with a frequency of 61% and the AAA codon with a
frequency of 39%. In contrast, in Bt proteins the lysine codons AAG and AAA are used with a frequency of
13% and 87%, respectively. It is known in the art that seldom used codons are generally detrimental to that
system and must be avoided or used judiciously. Thus, in designing a synthetic gene encoding the Btt
crystal protein, individual amino acid codons found in the original Btt gene are altered to reflect the codons
preferred by dicot genes for a particular amino acid. However, attention is given to maintaining the overall
distribution of codons for each amino acid within the coding region of the gene. For example, in the case of
alanine, it can be seen from Table 1 that the codon GCA is used in Bt proteins with a frequency of 50%,
whereas the codon GCT is the preferred codon in dicot proteins. In designing the synthetic Btt gene, not all
codons for alanine in the original Bt gene are replaced by GCT; instead, only some alanine codons are
changed to GCT while others are replaced with different alanine codons in an attempt to preserve the
overall distribution of codons for alanine used in dicot proteins. Column C in Table 1 documents that this
goal is achieved: the frequency of codon usage in dicot proteins (column A) corresponds very closely to
that used in the synthetic Btt gene (column C).
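One way to picture "maintaining the overall distribution of codons" is as an apportionment problem: for each amino acid, the number of sites in the gene is divided among its synonymous codons in proportion to the target frequencies. The sketch below is an assumption of this example, not the procedure actually used to design the synthetic gene; it uses largest-remainder rounding, a hypothetical function name, and the dicot lysine frequencies quoted above (AAG 61%, AAA 39%). The site count of 10 is illustrative only.

```python
def allocate_codons(n_sites, target_freqs):
    """Split n_sites occurrences of one amino acid among its synonymous codons
    so that the counts approximate target_freqs (largest-remainder rounding)."""
    total = sum(target_freqs.values())
    quotas = {codon: n_sites * f / total for codon, f in target_freqs.items()}
    counts = {codon: int(q) for codon, q in quotas.items()}
    leftover = n_sites - sum(counts.values())
    # Hand the remaining sites to the codons with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for codon in by_remainder[:leftover]:
        counts[codon] += 1
    return counts


dicot_lysine = {"AAG": 61, "AAA": 39}   # percentages quoted in the text
print(allocate_codons(10, dicot_lysine))
```

Which individual sites receive which codon is left open here; in practice that choice would also be constrained by restriction sites and by the sequence motifs to be avoided.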

In similar manner, a synthetic gene coding for insecticidal crystal protein can be optimized for
enhanced expression in monocot plants. In Table 1, column D, is presented the frequency of codon usage
of highly expressed monocot proteins.

Because of the degenerate nature of the genetic code, only part of the variation contained in a gene is
expressed in its protein. It is clear that variation between degenerate base frequencies is not a neutral
phenomenon since systematic codon preferences have been reported for bacterial, yeast and mammalian
genes. Analysis of a large group of plant gene sequences indicates that synonymous codons are used
differently by monocots and dicots. These patterns are also distinct from those reported for E. coli, yeast
and man.

In general, the plant codon usage pattern more closely resembles that of man and other higher
eukaryotes than unicellular organisms, due to the overall preference for G + C content in codon position III.
Monocots in this sample share the most commonly used codon for 13 of 18 amino acids with that reported
for a sample of human genes (Grantham et al. (1986) supra), although dicots favor the most commonly used
human codon in only 7 of 18 amino acids.

Discussions of plant codon usage have focused on the differences between codon choice in plant
nuclear genes and in chloroplasts. Chloroplasts differ from higher plants in that they encode only 30 tRNA
species. Since chloroplasts have restricted their tRNA genes, the use of preferred codons by
chloroplast-encoded proteins appears more extreme. However, a positive correlation has been reported between the
level of isoaccepting tRNA for a given amino acid and the frequency with which this codon is used in the
chloroplast genome (Pfitzinger et al. (1987) Nucl. Acids Res. 15:1377-1386).

Our analysis of the plant genes sample confirms earlier reports that the nuclear and chloroplast
genomes in plants have distinct coding strategies. The codon usage of monocots in this sample is distinct
from chloroplast usage, sharing the most commonly used codon for only 1 of 18 amino acids. Dicots in this
sample share the most commonly used codon of chloroplasts in only 4 of 18 amino acids. In general, the
chloroplast codon profile more closely resembles that of unicellular organisms, with a strong bias towards
the use of A + T in the degenerate third base.

In unicellular organisms, highly expressed genes use a smaller subset of codons than do weakly
expressed genes although the codons preferred are distinct in some cases. Sharp and Li (1986) Nucl. Acids
Res. 14:7734-7749 report that codon usage in 165 E. coli genes reveals a positive correlation between high
expression and increased codon bias. Bennetzen and Hall (1982) supra have described a similar trend in
codon selection in yeast. Codon usage in these highly expressed genes correlates with the abundance of
isoaccepting tRNAs in both yeast and E. coli. It has been proposed that the good fit of abundant yeast and
E. coli mRNA codon usage to isoacceptor tRNA abundance promotes high translation levels and high
steady state levels of these proteins. This strongly suggests that the potential for high levels of expression
of plant genes in yeast or E. coli is limited by their codon usage. Hoekema et al. (1987) supra report that
replacement of the 25 most favored yeast codons with rare codons in the 5' end of the highly expressed
gene PGK1 leads to a decrease in both mRNA and protein. These results indicate that codon bias should
be emphasized when engineering high expression of foreign genes in yeast and other systems.


(iii) Sequences within the Btt coding region having potentially destabilizing influences

Analysis of the Btt gene reveals that the A + T content represents 64% of the DNA base composition
of the coding region. This level of A + T is about 10% higher than that found in a typical plant coding
region. Most often, high A + T regions are found in intergenic regions. Also, many plant regulatory
sequences are observed to be AT-rich. These observations lead to the consideration that an elevated A + T
content within the Btt coding region may be contributing to a low expression level in plants.
Consequently, in designing a synthetic Btt gene, the A + T content is decreased to more closely approximate
the A + T levels found in plant proteins. As illustrated in Table 3, the A + T content is lowered to a level in
keeping with that found in coding regions of plant nuclear genes. The synthetic Btt gene of this invention
has an A + T content of 55%.

Table 3

Adenine + Thymine Content in Btt Coding Region

                               Base
Coding Region          G      A      T      C     %G + C   %A + T
Natural Btt gene      341    633    514    306      36       64
Synthetic Btt gene    392    530    483    428      45       55
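The percentages in Table 3 follow directly from the base counts, as the short check below shows. The counts are those given in Table 3; the dictionary layout and function name are merely illustrative.

```python
# Base counts from Table 3.
TABLE_3 = {
    "Natural Btt gene":   {"G": 341, "A": 633, "T": 514, "C": 306},
    "Synthetic Btt gene": {"G": 392, "A": 530, "T": 483, "C": 428},
}


def at_percent(counts):
    """Percentage of A + T among all four bases."""
    return 100 * (counts["A"] + counts["T"]) / sum(counts.values())


for name, counts in TABLE_3.items():
    length = sum(counts.values())
    print(f"{name}: {length} nt, A+T = {at_percent(counts):.1f}%")
```

The two totals differ by thirty-nine nucleotides, which matches the thirteen-codon extension added to the synthetic coding region.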
In addition, the natural Btt gene is scanned for sequences that are potentially destabilizing to Btt RNA.
These sequences, when identified in the original Btt gene, are eliminated through modification of nucleotide
sequences. Included in this group of potentially destabilizing sequences are:

(a) plant polyadenylation signals (as described by Joshi (1987) Nucl. Acids Res. 15:9627-9640). In
eukaryotes, the primary transcripts of nuclear genes are extensively processed (steps including 5'
capping, intron splicing, polyadenylation) to form mature and translatable mRNAs. In higher plants,
polyadenylation involves endonucleolytic cleavage at the polyA site followed by the addition of several A
residues to the cleaved end. The selection of the polyA site is presumed to be cis-regulated. During
expression of Bt protein and RNA in different plants, the present inventors have observed that the
polyadenylated mRNA isolated from these expression systems is not full-length but instead is truncated or
degraded. Hence, in the present invention it was decided to minimize possible destabilization of RNA
through elimination of potential polyadenylation signals within the coding region of the synthetic Btt gene.
Plant polyadenylation signals including AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA, and
AATAAG motifs do not appear in the synthetic Btt gene when scanned for 0 mismatches of the sequences.

(b) polymerase II termination sequence, CAN7-9AGTNNAA. This sequence was shown (Vankan and
Filipowicz (1988) EMBO J. 7:791-799) to be next to the 3' end of the coding region of the U2 snRNA genes
of Arabidopsis thaliana and is believed to be important for transcription termination upon 3' end processing.
The synthetic Btt gene is devoid of this termination sequence.

(c) CUUCGG hairpins, responsible for extraordinarily stable RNA secondary structures associated
with various biochemical processes (Tuerk et al. (1988) Proc. Natl. Acad. Sci. 85:1364-1368). The
exceptional stability of CUUCGG hairpins suggests that they have an unusual structure and may function in
organizing the proper folding of complex RNA structures. CUUCGG hairpin sequences are not found with
either 0 or 1 mismatches in the Btt coding region.

(d) plant consensus splice sites, 5' = AAG:GTAAGT and 3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C,
as described by Brown et al. (1986) EMBO J. 5:2749-2758. Consensus sequences for the 5' and
3' splice junctions have been derived from 20 and 30 plant intron sequences, respectively. Although it is not
likely that such potential splice sequences are present in Bt genes, a search was initiated for sequences
resembling plant consensus splice sites in the synthetic Btt gene. For the 5' splice site, the closest match
was with three mismatches. This gave 12 sequences of which two had G:GT. Only position 948 was
changed because 1323 has the KpnI site needed for reconstruction. The 3' splice site is not found in the
synthetic Btt gene.

Thus, by highlighting potential RNA-destabilizing sequences, the synthetic Btt gene is designed to
eliminate known eukaryotic regulatory sequences that affect RNA synthesis and processing.
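The scans described in (a) through (d) amount to searching a DNA sequence for short motifs, either exactly or with a permitted number of mismatches. A minimal sketch of such a scan is given below. The motif list is taken from the text; the function names, the report format and the test sequences are hypothetical, and no claim is made that the original search was performed in this way.

```python
import re

# Polyadenylation motifs listed in (a); the pol II terminator pattern from (b),
# written as a lookahead so that overlapping occurrences are all reported.
POLYA_SIGNALS = ["AATAAA", "AATGAA", "AATAAT", "AATATT", "GATAAA", "AATAAG"]
POL2_TERMINATOR = re.compile(r"(?=CA[ACGT]{7,9}AGT[ACGT]{2}AA)")


def find_motif(seq, motif, max_mismatches=0):
    """Return (1-based position, matched window, mismatches) for every window
    of seq that differs from motif at no more than max_mismatches positions."""
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


def scan(seq):
    """Report exact polyadenylation motifs and pol II terminator start positions."""
    seq = seq.upper()
    report = {motif: find_motif(seq, motif) for motif in POLYA_SIGNALS}
    report["CAN7-9AGTNNAA"] = [m.start() + 1 for m in POL2_TERMINATOR.finditer(seq)]
    return report


print(find_motif("GGAATAAACC", "AATAAA"))
print(scan("CATTTTTTTAGTCCAA")["CAN7-9AGTNNAA"])
```

Raising `max_mismatches` gives the relaxed searches mentioned for the hairpin and splice-site motifs, for which near matches are also of interest.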


Example 2. Chemical synthesis of a modified Btt structural gene 


(i) Synthesis Strategy 


The general plan for synthesizing linear double-stranded DNA sequences coding for the crystal protein
from Btt is schematically simplified in Figure 2. The optimized DNA coding sequence (Figure 1) is divided
into thirteen segments (segments A-M) to be synthesized individually, isolated and purified. As shown in
Figure 2, the general strategy begins by enzymatically joining segments A and M to form segment AM, to
which is added segment BL to form segment ABLM. Segment CK is then added enzymatically to make
segment ABCKLM, which is enlarged through addition of segments DJ, EI and FGH sequentially to give
finally the total segment ABCDEFGHIJKLM, representing the entire coding region of the Btt gene.
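The outside-in order of assembly can be written out mechanically. The sketch below lists the intermediate constructs produced when paired segments are inserted between the two halves of the growing construct. It assumes that the last insert consists of the three middle segments taken together (F, G and H); the function name and the `final_block` parameter are hypothetical.

```python
def nested_assembly(names, final_block=3):
    """List the intermediate constructs when segments are added outside-in.

    Pairs (first, last) are taken from the remaining middle segments until
    at most final_block segments are left; those are added as one block.
    """
    left, right = [], []
    middle = list(names)
    steps = []
    while len(middle) > final_block:
        left.append(middle.pop(0))      # next segment from the 5' side
        right.insert(0, middle.pop())   # next segment from the 3' side
        steps.append("".join(left + right))
    if middle:
        steps.append("".join(left + middle + right))
    return steps


for construct in nested_assembly("ABCDEFGHIJKLM"):
    print(construct)
```

With thirteen segments this gives AM, ABLM, ABCKLM, ABCDJKLM, ABCDEIJKLM and finally ABCDEFGHIJKLM.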

Figure 3 outlines in more detail the strategy used in combining individual DNA segments in order to
effect the synthesis of a gene having unique restriction sites integrated into a defined nucleotide sequence.
Each of the thirteen segments (A to M) has unique restriction sites at both ends, allowing the segment to be
strategically spliced into a growing DNA polymer. Also, unique sites are placed at each end of the gene to
enable easy transfer from one vector to another.

The thirteen segments (A to M) used to construct the synthetic gene vary in size. Oligonucleotide pairs
of approximately 75 nucleotides each are used to construct larger segments having approximately 225
nucleotide pairs. Figure 3 documents the number of base pairs contained within each segment and
specifies the unique restriction sites bordering each segment. Also, the overall strategy to incorporate
specific segments at appropriate splice sites is detailed in Figure 3.
(ii) Preparation of oligodeoxynucleotides

Preparation of oligodeoxynucleotides for use in the synthesis of a DNA sequence comprising a gene for
Btt is carried out according to the general procedures described by Matteucci et al. (1981) J. Am. Chem.
Soc. 103:3185-3192 and Beaucage et al. (1981) Tetrahedron Lett. 22:1859-1862. All oligonucleotides are
prepared by the solid-phase phosphoramidite triester coupling approach, using an Applied Biosystems
Model 380A DNA synthesizer. Deprotection and cleavage of the oligomers from the solid support are
carried out according to standard procedures. Crude oligonucleotide mixtures are purified using an
oligonucleotide purification cartridge (OPC, Applied Biosystems) as described by McBride et al. (1988)
Biotechniques 6:362-367.

5'-phosphorylation of oligonucleotides is performed with T4 polynucleotide kinase. The reaction
contains 2 µg oligonucleotide and 18.2 units polynucleotide kinase (Pharmacia) in linker kinase buffer (Maniatis
(1982) Cloning Manual, Fritsch and Sambrook (eds.), Cold Spring Harbor Laboratory, Cold Spring Harbor,
NY). The reaction is incubated at 37°C for 1 hour.

Oligonucleotides are annealed by first heating to 95°C for 5 min. and then allowing complementary
pairs to cool slowly to room temperature. Annealed pairs are reheated to 65°C, solutions are combined,
cooled slowly to room temperature and kept on ice until used. The ligated mixture may be purified by
electrophoresis through a 4% NuSieve agarose (FMC) gel. The band corresponding to the ligated duplex is
excised, the DNA is extracted from the agarose and ethanol precipitated.

Ligations are carried out as exemplified by that used in M segment ligations. M segment DNA is
brought to 65°C for 25 min, the desired vector is added and the reaction mixture is incubated at 65°C for
15 min. The reaction is slow cooled over 1-1/2 hours to room temperature. ATP to 0.5 mM and 3.5 units of
T4 DNA ligase salts are added and the reaction mixture is incubated for 2 hr at room temperature and then
maintained overnight at 15°C. The next morning, vectors which had not been ligated to M block DNA were
removed upon linearization by EcoRI digestion. Vectors ligated to the M segment DNA are used to
transform E. coli MC1061. Colonies containing inserted blocks are identified by colony hybridization
with 32P-labelled oligonucleotide probes. The sequence of the DNA segment is confirmed by isolating
plasmid DNA and sequencing using the dideoxy method of Sanger et al. (1977) Proc. Natl. Acad. Sci.
74:5463-5467.


(iii) Synthesis of Segment AM 

Three oligonucleotide pairs (A1 and its complementary strand A1c, A2 and A2c, and A3 and A3c) are 
assembled and ligated as described above to make up segment A. The nucleotide sequence of segment A 
is as follows: 

In Table 5 bold lines demarcate the individual oligonucleotides. Segment M contains 252 base pairs and 
has destroyed EcoRI restriction sites at both ends. (Additional restriction sites within segment M are 
indicated.) Segment M is inserted into vector pIC20R at an EcoRI restriction site and cloned. 

As proposed in Figure 3, segment M is joined to segment A in the plasmid in which it is contained. 
Segment M is excised at the flanking restriction sites from its cloning vector and spliced into pIC20K, 
harboring segment A, through successive digestions with HindIII followed by BglII. The pIC20K vector now 
comprises segment A joined to segment M with a HindIII site at the splice site (see Figure 3). Plasmid 
pIC20K is derived from pIC20R by removing the ScaI-NdeI DNA fragment and inserting a HincII fragment 
containing an NPTI coding region. The resulting plasmid of 4.44 kb confers resistance to kanamycin on E. 
coli. 


Example 3. Expression of synthetic crystal protein gene in bacterial systems 


The synthetic Btt gene is designed so that it is expressed in the pIC20R-kan vector in which it is 
constructed. This expression is produced utilizing the initiation methionine of the lacZ protein of pIC20K. 
The wild-type Btt crystal protein sequence expressed in this manner has full insecticidal activity. In addition, 
the synthetic gene is designed to contain a BamHI site 5' proximal to the initiating methionine codon and a 
BglII site 3' to the terminal TAG translation stop codon. This facilitates the cloning of the insecticidal crystal 
protein coding region into bacterial expression vectors such as pDR540 (Russell and Bennett, 1982). 
Plasmid pDR540 contains the TAC promoter which allows the production of proteins including Btt crystal 
protein under controlled conditions in amounts up to 10% of the total bacterial protein. This promoter 
functions in many gram-negative bacteria including E. coli and Pseudomonas. 

Production of Bt insecticidal crystal protein from the synthetic gene in bacteria demonstrates that the 
protein produced has the expected toxicity to coleopteran insects. These recombinant bacterial strains in 
themselves have potential value as microbial insecticides, product of the synthetic gene. 

Example 4. Expression of a synthetic crystal protein gene in plants 

The synthetic Btt crystal protein gene is designed to facilitate cloning into the expression cassettes. 
These utilize sites compatible with the BamHI and BglII restriction sites flanking the synthetic gene. 
Cassettes are available that utilize plant promoters including CaMV 35S, CaMV 19S and the ORF 24 
promoter from T-DNA. These cassettes provide the recognition signals essential for expression of proteins 
in plants. These cassettes are utilized in the micro Ti plasmids such as pH575. Plasmids such as pH575 
containing the synthetic Btt gene directed by plant expression signals are utilized in disarmed 
Agrobacterium tumefaciens to introduce the synthetic gene into plant genomic DNA. This system has been 
described previously by Adang et al. (1987) to express Bt var. kurstaki crystal protein gene in tobacco 
plants. These tobacco plants were toxic to feeding tobacco hornworms. 
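
The reason BamHI and BglII ends can be joined to one another is that both enzymes leave the same four-base 5' overhang. The sketch below illustrates this, assuming the standard recognition sequences GGATCC (BamHI) and AGATCT (BglII), each cut after the first base; these sequences are not stated in this text, and the function names are hypothetical.

```python
# Recognition site and cut position (bases from the 5' end of each strand).
# Assumed standard values for these enzymes; not taken from this document.
SITES = {
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
}


def five_prime_overhang(enzyme: str) -> str:
    """Single-stranded 5' overhang left by a palindromic site."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def ends_compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Two ends can be ligated when their overhangs are identical."""
    return five_prime_overhang(enzyme_a) == five_prime_overhang(enzyme_b)


def hybrid_junction(left: str, right: str) -> str:
    """Top-strand sequence formed when a `left` end is ligated to a `right` end."""
    left_site, cut = SITES[left]
    right_site, _ = SITES[right]
    return left_site[:len(left_site) - cut] + right_site[len(right_site) - cut:]


if __name__ == "__main__":
    print(five_prime_overhang("BamHI"), five_prime_overhang("BglII"))
    print(ends_compatible("BamHI", "BglII"))
    print(hybrid_junction("BamHI", "BglII"))
```

Under these assumptions the hybrid junction matches neither recognition site, so a BamHI end joined to a BglII end cannot be recut by either enzyme.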


Example 5. Assay for insecticidal activity 

Bioassays were conducted essentially as described by Sekar, V. et al., supra. Toxicity was assessed by 
an estimate of the LD50. Plasmids were grown in E. coli JM105 (Yanisch-Perron, C. et al. (1985) Gene 
33:103-119). On a molar basis, no significant differences in toxicity were observed between crystal proteins 
encoded by p544Pst-Met5, p544-HindIII, and pNSBP544. When expressed in plants under identical 
conditions, cells containing protein encoded by the synthetic gene were observed to be more toxic than 
those containing protein encoded by the native Btt gene. Immunoblots ("western" blots) of cell cultures 
indicated that those that were more toxic had more crystal protein antigen. Improved expression of the 
synthetic Btt gene relative to that of a natural Btt gene was seen as the ability to quantitate specific mRNA 
transcripts from expression of synthetic Btt genes on Northern blot assays. 


Claims 


1. A synthetic gene designed to be highly expressed in plants comprising a DNA sequence encoding 
an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt. 

2. A synthetic gene of claim 1 wherein said DNA sequence is at least about 85% homologous to a 
native insecticidal protein gene of Btt. 

3. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning 
nucleotides 1 through 1793. 

4. A synthetic gene of claim 1 wherein said DNA sequence is that presented in Figure 1, spanning 
nucleotides 1 through 1833. 

5. A synthetic gene of claim 1 wherein the overall frequency of preferred codon usage within the entire 
coding region of said synthetic gene is within about 75% of the frequency of codon usage preferred in 
plants. 

6. A synthetic gene of claim 1 wherein the A + T base content of said DNA sequence is substantially 
equal to the A + T base content found in plant structural genes. 

7. A synthetic gene of claim 1 wherein a plant initiation sequence is present at the 5' end of the coding 
region. 

8. A synthetic gene of claim 1 wherein plant polyadenylation signals, comprising those having 
AATAAA, AATGAA, AATAAT, AATATT, GATAAA, GATAAA and AATAAG motifs, are eliminated in said DNA 
sequence. 

9. A synthetic gene of claim 1 wherein the polymerase II termination sequence, CAN7-9AGTNNAA, is 
eliminated in said DNA sequence. 

10. A synthetic gene of claim 1 wherein CUUCGG hairpins are eliminated in said DNA sequence. 

11. A synthetic gene of claim 1 wherein plant consensus splice sites, including 5' = AAG:GTAAGT and 
3' = TTTT(Pu)TTT(Pu)T(Pu)T(Pu)T(Pu)TGCAG:C, are eliminated in said DNA sequence. 

12. A synthetic gene of claim 1 wherein the CG and TA doublet avoidance indices are substantially 
equal to that of highly expressed genes in the selected host plant. 

13. A recombinant DNA cloning vector comprising said synthetic gene of claim 1. 

14. A plant cell which contains the synthetic gene of claim 1. 

15. An improved method of producing a protein toxic to an insect comprising the step of introducing 
into a host plant cell a DNA segment comprising a synthetic gene designed to be highly expressed in 
plants comprising a DNA sequence encoding an insecticidal protein which is functionally equivalent to a 
native insecticidal protein of Bt such that said synthetic gene is expressed in said plant host. 

Figure 2. Simplified Scheme for Synthesis of Btt Gene 

tions, was disclosed by Casadaban, M.J. and Cohen, S.N. (1980) J. Mol. Biol. 138:179-207. 


Example 1. Design of the synthetic insecticidal crystal protein gene 


(i) Preparation of toxic subclones of the Btt gene 

Construction, isolation, and characterization of pNSBP544 is disclosed by Sekar, V. et al. (1987) Proc. 
Natl. Acad. Sci. USA 84:7036-7040, and Sekar, V. and Adang, M.J., U.S. patent application serial no. 
108,285, filed October 13, 1987, which is hereby incorporated by reference. A 3.0 kbp HindIII fragment 
carrying the crystal protein gene of pNSBP544 is inserted into the HindIII site of pIC-20H (Marsh, J.L. et al. 
(1984) Gene 32:481-485), thereby yielding a plasmid designated p544-HindIII, which is on deposit. 
Expression in E. coli yields a 73 kDa crystal protein in addition to the 65 kDa species characteristic of the 
crystal protein obtained from Btt isolates. 

A 5.9 kbp BamHI fragment carrying the crystal protein gene is removed from pNSBP544 and inserted 
into BamHI-linearized pIC-20H DNA. The resulting plasmid, p405/44-7, is digested with BglII and religated, 
thereby removing Bacillus sequences flanking the 3'-end of the crystal protein gene. The resulting plasmid, 
p405/54-12, is digested with PstI and religated, thereby removing Bacillus sequences flanking the 5'-end of 
the crystal protein and about 150 bp from the 5'-end of the crystal protein structural gene. The resulting 
plasmid, p405/81-4, is digested with SphI and PstI and is mixed with and ligated to a synthetic linker having 
the following structure: 


(SD indicates the location of a Shine-Dalgarno prokaryotic ribosome binding site.) The resulting plasmid, 
p544Pst-Met5, contains a structural gene encoding a protein identical to one encoded by pNSBP544 except 
for a deletion of the amino-terminal 47 amino acid residues. The nucleotide sequence of the Btt coding 
region in p544Pst-Met5 is presented in Figure 1. In bioassays (Sekar and Adang, U.S. patent application 
serial no. 108,285, supra), the proteins encoded by the full-length Btt gene in pNSBP544 and the N-terminal 
deletion derivative, p544Pst-Met5, were shown to be equally toxic. All of the plasmids mentioned above 
have their crystal protein genes in the same orientation as the lacZ gene of the vector. 

(ii) Modification of preferred codon usage 


Table 1 presents the frequency of codon usage for (A) dicot proteins, (B) Bt proteins, (C) the synthetic 
Btt gene, and (D) monocot proteins. Although some codons for a particular amino acid are utilized to 
approximately the same extent by both dicot and Bt proteins (e.g., the codons for serine), for the most part, 
the distribution of codon frequency varies significantly between dicot and Bt proteins, as illustrated in 
columns A and B in Table 1. 

Table 1 (CONTINUED) 





                          Distribution Fraction 

Amino            (A) Dicot   (B) Bt    (C) Synthetic   (D) Monocot 
Acid    Codon    Genes       Genes     Btt Gene        Genes 

Phe     TTT      0.45        0.75      0.44            0.28 
Phe     TTC      0.55        0.25      0.56            0.72 
Ser     AGT      0.14        0.25      0.13            0.07 
Ser     AGC      0.18        0.13      0.19            0.25 
Ser     TCG      0.05        0.08      0.06            0.13 
Ser     TCA      0.18        0.19      0.17            0.13 
Ser     TCT      0.26        0.25      0.27            0.18 
Ser     TCC      0.19        0.10      0.17            0.24 
Arg     AGG      0.22        0.09      0.23            0.28 
Arg     AGA      0.31        0.50      0.32            0.08 
Arg     CGG      0.04        0.14      0.05            0.14 
Arg     CGA      0.09        0.14      0.09            0.04 
Arg     CGT      0.23        0.09      0.23            0.11 
Arg     CGC      0.11        0.05      0.09            0.36 
Gln     CAG      0.38        0.18      0.39            0.43 
Gln     CAA      0.62        0.82      0.61            0.57 
His     CAT      0.52        0.90      0.50            0.38 
His     CAC      0.48        0.10      0.50            0.62 
Leu     TTG      0.26        0.08      0.27            0.15 
Leu     TTA      0.10        0.46      0.12            0.04 
Leu     CTG      0.09        0.04      0.10            0.27 
Leu     CTA      0.08        0.21      0.10            0.11 
Leu     CTT      0.29        0.15      0.18            0.16 
Leu     CTC      0.19        0.06      0.22            0.27 
Pro     CCG      0.07        0.20      0.08            0.20 
Pro     CCA      0.44        0.56      0.44            0.39 
Pro     CCT      0.32        0.24      0.32            0.19 
Pro     CCC      0.16        0.00      0.16            0.22 
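
The distribution fraction in Table 1 is the share of each synonymous codon among all codons used for one amino acid. As a minimal sketch, the Python below computes such fractions from a coding sequence and then compares columns using the Phe and His rows of Table 1. The function names, the choice of mean absolute difference as a closeness measure, and the short test sequence are assumptions made for illustration; they are not the method used to compile the table.

```python
from collections import Counter

# Rows copied from Table 1: codon -> (dicot genes, Bt genes, synthetic Btt gene)
TABLE1 = {
    "TTT": (0.45, 0.75, 0.44),  # Phe
    "TTC": (0.55, 0.25, 0.56),  # Phe
    "CAT": (0.52, 0.90, 0.50),  # His
    "CAC": (0.48, 0.10, 0.50),  # His
}
DICOT = [row[0] for row in TABLE1.values()]
BT = [row[1] for row in TABLE1.values()]
SYNTHETIC = [row[2] for row in TABLE1.values()]


def codon_fractions(cds: str, synonyms: tuple) -> dict:
    """Fraction of each synonymous codon among those found in `cds`.

    `cds` is read in frame from its first base; a trailing partial codon
    is ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in synonyms)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in synonyms}


def mean_abs_diff(col_a: list, col_b: list) -> float:
    """Average absolute difference between two columns of fractions."""
    return sum(abs(a - b) for a, b in zip(col_a, col_b)) / len(col_a)


if __name__ == "__main__":
    # Made-up in-frame sequence: TTT TTC TTC CAT
    print(codon_fractions("TTTTTCTTCCAT", ("TTT", "TTC")))
    print(mean_abs_diff(SYNTHETIC, DICOT), mean_abs_diff(SYNTHETIC, BT))
```

For these four rows the synthetic gene column lies much closer to the dicot column than to the Bt column, which is the pattern the table is meant to show.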


Bt coding sequences publicly available and 88 coding 
sequences of dicot nuclear genes were used to compile the 
codon usage table. The pooled dicot coding sequences, 
obtained from GenBank, were: 

GENUS/SPECIES               GENBANK      PROTEIN                                          REF 

Antirrhinum majus           AMACHS       Chalcone synthetase 
Arabidopsis thaliana        ATHADH       Alcohol dehydrogenase 
                            ATHH3GA      Histone 3 gene 1 
                            ATHH3GB      Histone 3 gene 2 
                            ATHH4GA      Histone 4 gene 1 
                            ATHLHCP1     CAB 
                            ATHTUBA      α tubulin 
                                         5-enolpyruvylshikimate 3-phosphate synthetase 
Bertholletia excelsa                     High methionine storage protein                  2 
Brassica campestris                      Acyl carrier protein                             3 
Brassica napus              BNANAP       Napin 
Brassica oleracea           BOLSLSGR     S-locus specific glycoprotein 
Canavalia ensiformis        CENCONA      Concanavalin A 
Carica papaya               CPAPAP       Papain 
Chlamydomonas reinhardtii   CREC552      Preapocytochrome 
                            CRERBCS1     RuBPC small subunit gene 1 
                            CRERBCS2     RuBPC small subunit gene 2 
Cucurbita pepo              CUCPHT       Phytochrome 
Cucumis sativus             CUSGMS       Glyoxysomal malate synthetase 
                            CUSLHCPA     CAB 
                            CUSSSU       RuBPC small subunit 
Daucus carota               DAREXT       Extensin 
                            DAREXTR      33 kD extensin related protein 
Dolichos biflorus           DBILECS      Seed lectin 
Flaveria trinervia          FTRBCR       RuBPC small subunit 
Glycine max                 SOY7SAA      7S storage protein 
                            SOYACT1G 
                            SOYCIIPI     CII protease inhibitor 
                            SOYGLYA1A    Glycinin A1a Bx subunits 
                            SOYGLYAAB    Glycinin A5A4B3 subunits 
                            SOYGLYAB     Glycinin A3/b4 subunits 
                            SOYGLYR      Glycinin A2B1a subunits 
                            SOYHSP175    Low M W heat shock proteins 
                            SOYLGBI      Leghemoglobin 
                            SOYLEA 
                            SOYLOX       Lipoxygenase 1 
                            SOYNOD20G    20 kDa nodulin 
                            SOYNOD23G    23 kDa nodulin 
                            SOYNOD24H    24 kDa nodulin 
                            SOYNOD26B    26 kDa nodulin 
                            SOYNOD26R    26 kDa nodulin 
                            SOYNOD27R    27 kDa nodulin 
                            SOYNOD35M    35 kDa nodulin 
                            SOYNOD75 
                            SOYNODR1     Nodulin C51 
                            SOYNODR2     Nodulin E27 
                            SOYPRP1      Proline rich protein 
                            SOYRUBP      RuBPC small subunit 
                            SOYURA       Urease 
                            SOYHSP26A    Heat shock protein 26A 
                                         Nuclear-encoded chloroplast heat shock protein 
                                         22 kDa nodulin 
                                         β1 tubulin                                       6 
                                         β2 tubulin                                       6 


EP 0 359 472 A2 


GENUS/SPECIES entries legible on this page, in the order printed (their row alignment is not recoverable): 
Ipomoea batatas; Lupinus luteus; Lycopersicon esculentum; crystallinum; Nicotiana plumbaginifolia; 
Nicotiana tabacum; Persea americana; hortense; Phaseolus vulgaris 

GENBANK      PROTEIN 

             α globulin (vicilin) 
             β globulin (vicilin) 
HNNRUBCS     RuBPC small subunit 
             2S albumin seed storage protein 
             Wound-induced catalase 
LGIAB19      CAB 
LGIR5BPC     RuBPC small subunit 
LUPLBR       Leghemoglobin I 
TOMBIOBR     Biotin binding protein 
TOMETHYBR    Ethylene biosynthesis protein 
TOMPG2AR     Polygalacturonase-2a 
TOMPSI       Tomato photosystem I protein 
TOMRBCSA     RuBPC small subunit 
TOMRBCSB     RuBPC small subunit 
TOMRBCSC     RuBPC small subunit 
TOMRBCSD     RuBPC small subunit 
TOMRRD       Ripening related protein 
TOMWIPIG     Wound induced proteinase inhibitor I 
TOMWIPII     Wound induced proteinase inhibitor II 
             CAB 1A 
             CAB 1B 
             CAB 3C 
             CAB 4 
             CAB 5 
ALFLB3R      Leghemoglobin III 
             RuBPC small subunit 
TOBATP21     Mitochondrial ATP synthase β subunit 
             Nitrate reductase 
             Glutamine synthetase 
TOBECH       Endochitinase 
TOBGAPA      A subunit of chloroplast G3PD 
TOBGAPB      B subunit of chloroplast G3PD 
TOBGAPC      C subunit of chloroplast G3PD 
TOBPR1AR     Pathogenesis related protein 1a 
TOBPR1CR     Pathogenesis-related protein 1c 
TOBPRPR      Pathogenesis related protein 1b 
TOBPXDLF     Peroxidase 
TOBRBPCO     RuBPC small subunit 
TOBTHAUR     TMV-induced protein homologous to thaumatin 
AVOCEL       Cellulase 
PHOCHL       Chalcone synthase 
PETCAB13     CAB 13 
PETCAB22L    CAB 22L 
PETCAB22R    CAB 22R 
PETCAB25     CAB 25 
PETCAB37     CAB 37 
PETCAB91R    CAB 91R 
PETCHSR      Chalcone synthase 
PETGCR1      Glycine-rich protein 
PETRBCS08    RuBPC small subunit 
PETRBCS11    RuBPC small subunit 
             70 kDa heat shock protein 
PHVCHM       Chitinase 
PHVDLECA     Phytohemagglutinin E 
PHVDLECB     Phytohemagglutinin L 
PHVGSR1      Glutamine synthetase 1 
PHVGSR2      Glutamine synthetase 2 